AITopics | target area

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Color Visual Illusions: A Statistics-based Computational Model

Neural Information Processing Systems, Feb-8-2026, 18:51:40 GMT

However, neither the data nor the tools existed in the past to extensively support these explanations. The era of big data opens a new opportunity to study input-driven approaches. We introduce a tool that computes the likelihood of patches, given a large dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner.

artificial intelligence, illusion, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

TRACE: Textual Reasoning for Affordance Coordinate Extraction

Park, Sangyun, Kim, Jin, Cui, Yuchen, Brown, Matthew S.

arXiv.org Artificial Intelligence, Nov-5-2025

Vision-Language Models (VLMs) struggle to translate high-level instructions into the precise spatial affordances required for robotic manipulation. While visual Chain-of-Thought (CoT) methods exist, they are often computationally intensive. In this work, we introduce TRACE (Textual Reasoning for Affordance Coordinate Extraction), a novel methodology that integrates a textual Chain of Reasoning (CoR) into the affordance prediction process. We use this methodology to create the TRACE dataset, a large-scale collection created via an autonomous pipeline that pairs instructions with explicit textual rationales. By fine-tuning a VLM on this data, our model learns to externalize its spatial reasoning before acting. Our experiments show that our TRACE-tuned model achieves state-of-the-art performance, reaching 48.1% accuracy on the primary Where2Place (W2P) benchmark (a 9.6% relative improvement) and 55.0% on the more challenging W2P(h) subset. Crucially, an ablation study demonstrates that performance scales directly with the amount of reasoning data used, confirming the CoR's effectiveness. Furthermore, analysis of the model's attention maps reveals an interpretable reasoning process where focus shifts dynamically across reasoning steps. This work shows that training VLMs to generate a textual CoR is an effective and robust strategy for enhancing the precision, reliability, and interpretability of VLM-based robot control. Our dataset and code are available at https://github.com/jink-ucla/TRACE

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.01999

Country: North America > United States > California > Los Angeles County > Los Angeles (0.29)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Color Visual Illusions: A Statistics-based Computational Model

Neural Information Processing Systems, Oct-3-2025, 04:04:15 GMT

The era of big data opens a new opportunity to study input-driven approaches. We introduce a tool that computes the likelihood of patches, given a large dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner.

illusion, likelihood, visual illusion, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.04)
North America > Canada (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Cooperative Target Detection with AUVs: A Dual-Timescale Hierarchical MARDL Approach

Xueyao, Zhang, Bo, Yang, Zhiwen, Yu, Xuelin, Cao, Alexandropoulos, George C., Debbah, Merouane, Yuen, Chau

arXiv.org Artificial Intelligence, Sep-18-2025

Autonomous Underwater Vehicles (AUVs) have shown great potential for cooperative detection and reconnaissance. However, collaborative AUV communications introduce risks of exposure. In adversarial environments, achieving efficient collaboration while ensuring covert operations becomes a key challenge for underwater cooperative missions. In this paper, we propose a novel dual time-scale Hierarchical Multi-Agent Proximal Policy Optimization (H-MAPPO) framework. The high-level component determines the individuals participating in the task based on a central AUV, while the low-level component reduces exposure probabilities through power and trajectory control by the participating AUVs. Simulation results show that the proposed framework achieves rapid convergence, outperforms benchmark algorithms in terms of performance, and maximizes long-term cooperative efficiency while ensuring covert operations.

evolutionary algorithm, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2509.13381

Country:

Asia > China (0.47)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
(2 more...)

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Zhang, Miaosen, Xu, Ziqiang, Zhu, Jialiang, Dai, Qi, Qiu, Kai, Yang, Yifan, Luo, Chong, Chen, Tianyi, Wagle, Justin, Franklin, Tim, Guo, Baining

arXiv.org Artificial Intelligence, Aug-1-2025

With the development of multimodal reasoning models, Computer Use Agents (CUAs), akin to Jarvis from "Iron Man", are becoming a reality. GUI grounding is a core component for CUAs to execute actual actions, similar to mechanical control in robotics, and it directly leads to the success or failure of the system. It determines actions such as clicking and typing, as well as related parameters like the coordinates for clicks. Current end-to-end grounding models still achieve less than 65% accuracy on challenging benchmarks like ScreenSpot-pro and UI-Vision, indicating they are far from being ready for deployment. In this work, we conduct an empirical study on the training of grounding models, examining details from data collection to model training. Ultimately, we developed the Phi-Ground model family, which achieves state-of-the-art performance across all five grounding benchmarks for models under 10B parameters in agent settings. In the end-to-end model setting, our model still achieves SOTA results with scores of 43.2 on ScreenSpot-pro and 27.2 on UI-Vision. We believe that the various details discussed in this paper, along with our successes and failures, not only clarify the construction of grounding models but also benefit other perception tasks. Project homepage: https://zhangmiaosen2000.github.io/Phi-Ground/

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2507.23779

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Software (0.45)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(8 more...)

Decentralized Consensus Inference-based Hierarchical Reinforcement Learning for Multi-Constrained UAV Pursuit-Evasion Game

Yuming, Xiang, Sizhao, Li, Rongpeng, Li, Zhifeng, Zhao, Honggang, Zhang

arXiv.org Artificial Intelligence, Jun-24-2025

Multiple quadrotor unmanned aerial vehicle (UAV) systems have garnered widespread research interest and fostered numerous interesting applications, especially in multi-constrained pursuit-evasion games (MC-PEG). The Cooperative Evasion and Formation Coverage (CEFC) task, where the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, is one of the most challenging problems in MC-PEG, especially under communication-limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, brings significant high-dimensional complications to locating a solution. In this paper, we propose a novel two-level framework, Consensus Inference-based Hierarchical Reinforcement Learning (CI-HRL), which delegates target localization to a high-level policy, while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multi-agent reinforcement learning module, Consensus-oriented Multi-Agent Communication (ConsMAC), to enable agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage an Alternative Training-based Multi-agent proximal policy optimization (AT-M) and policy distillation to accomplish the low-level control. The experimental results, including the high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution with enhanced collaborative evasion and task completion capabilities for the swarm.

Nowadays, quadrotor Unmanned Aerial Vehicles (UAVs) have demonstrated great potential in costly or human-unfriendly tasks (e.g., disaster response [1]), due to their agility, cost-effectiveness, and compact size. Nevertheless, the UAV swarm is likely to be exposed to an adversarial environment, where a hostile factor or agent might attack the affiliated members, and must respond promptly to boost the survival opportunity.

Yuming Xiang, Sizhao Li, and Rongpeng Li are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (email: {xiangym1999; liszh5; lirongpeng}@zju.edu.cn).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2506.18126

Country: Asia > China > Zhejiang Province > Hangzhou (0.24)

Genre:

Research Report > Promising Solution (0.34)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Information Technology (0.48)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs

Tang, Wenjing, He, Xinyu, Huang, Yongxi, Xiao, Yunxiao, Lu, Cewu, Cai, Panpan

arXiv.org Artificial Intelligence, Jun-4-2025

Task planning under uncertainty is essential for home-service robots operating in the real world. Tasks involve ambiguous human instructions, hidden or unknown object locations, and open-vocabulary object types, leading to significant open-ended uncertainty and a boundlessly large planning space. To address these challenges, we propose Tru-POMDP, a planner that combines structured belief generation using Large Language Models (LLMs) with principled POMDP planning. Tru-POMDP introduces a hierarchical Tree of Hypotheses (TOH), which systematically queries an LLM to construct high-quality particle beliefs over possible world states and human goals. We further formulate an open-ended POMDP model that enables rigorous Bayesian belief tracking and efficient belief-space planning over these LLM-generated hypotheses. Experiments on complex object rearrangement tasks across diverse kitchen environments show that Tru-POMDP significantly outperforms state-of-the-art LLM-based and LLM-tree-search hybrid planners, achieving higher success rates with significantly better plans, stronger robustness to ambiguity and occlusion, and greater planning efficiency.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.0286

Country: Asia > China (0.28)

Genre:

Workflow (0.68)
Research Report (0.50)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
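The Bayesian belief tracking over hypotheses mentioned in the Tru-POMDP abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `update_belief`, the hypothesis names, and all probabilities below are hypothetical, and the hypotheses are hard-coded here rather than generated by an LLM. The sketch only shows the generic step of reweighting a discrete set of weighted hypotheses by an observation likelihood and renormalizing.

```python
from typing import Dict


def update_belief(belief: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes update over a discrete set of weighted hypotheses.

    belief:     prior weight per hypothesis (sums to 1)
    likelihood: P(observation | hypothesis) for the observation just received
    """
    # Multiply each prior weight by the likelihood of the observation.
    unnormalized = {h: w * likelihood.get(h, 0.0) for h, w in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation contradicts every hypothesis; keep the prior
        # instead of dividing by zero.
        return dict(belief)
    # Renormalize so the weights sum to 1 again.
    return {h: w / total for h, w in unnormalized.items()}


# Hypothetical example: three guesses about where a cup is hidden.
prior = {"cabinet": 0.5, "fridge": 0.3, "drawer": 0.2}
# The robot opens the cabinet and does not see the cup (made-up likelihoods).
obs_likelihood = {"cabinet": 0.1, "fridge": 1.0, "drawer": 1.0}
posterior = update_belief(prior, obs_likelihood)
```

After the update, weight shifts away from the cabinet toward the two locations that remain consistent with the observation.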

Rapid AI-based generation of coverage paths for dispensing applications

Baeuerle, Simon, Mendonca, Ian F., Van Laerhoven, Kristof, Mikut, Ralf, Steimer, Andreas

arXiv.org Artificial Intelligence, May-7-2025

Coverage Path Planning of Thermal Interface Materials (TIM) plays a crucial role in the design of power electronics and electronic control units. Until now, this has been done manually by experts or with optimization approaches that require high computational effort. We propose a novel AI-based approach to generate dispense paths for TIM and similar dispensing applications. It is a drop-in replacement for optimization-based approaches. An Artificial Neural Network (ANN) receives the target cooling area as input and directly outputs the dispense path. Our proposed setup does not require labels and we show its feasibility on multiple target areas. The resulting dispense paths can be directly transferred to automated manufacturing equipment and do not exhibit air entrapments. The approach of using an ANN to predict process parameters for a desired target state in real-time could potentially be transferred to other manufacturing processes.

artificial intelligence, dispense path, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.0356

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Siegen (0.05)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
Asia > Singapore (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective

Collini, Luca, Hennessee, Andrew, Karri, Ramesh, Garg, Siddharth

arXiv.org Artificial Intelligence, Mar-16-2025

Recent Large Language Models (LLMs) such as OpenAI o3-mini and DeepSeek-R1 use enhanced reasoning through Chain-of-Thought (CoT). Their potential in hardware design, which relies on expert-driven iterative optimization, remains unexplored. This paper investigates whether reasoning LLMs can address challenges in High-Level Synthesis (HLS) design space exploration and optimization. During HLS, engineers manually define pragmas/directives to balance performance and resource constraints. We propose an LLM-based optimization agentic framework that automatically restructures code, inserts pragmas, and identifies optimal design points via feedback from HLS tools and access to integer-linear programming (ILP) solvers. Experiments compare reasoning models against conventional LLMs on benchmarks using success rate, efficiency, and design quality (area/latency) metrics, and provide the first-ever glimpse into the CoTs produced by a powerful open-source reasoning model like DeepSeek-R1.

large language model, latency, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.12721

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Channel Gain Map Construction based on Subregional Learning and Prediction

Chen, Jiayi, Gao, Ruifeng, Wang, Jue, Sun, Shu, Wu, Yi

arXiv.org Artificial Intelligence, Feb-5-2025

The construction of a channel gain map (CGM) is essential for realizing the environment-aware wireless communications expected in 6G, for which a fundamental problem is how to effectively predict the channel gains at unknown locations from a finite number of measurements. As using a single prediction model is not effective in complex propagation environments, we propose a subregional learning-based CGM construction scheme, in which the entire map is divided into subregions via data-driven clustering, and individual models are then constructed and trained for every subregion. In this way, the specific propagation features of each subregion can be better extracted with finite training data. Moreover, we propose to further improve prediction accuracy by uneven subregion sampling, as well as training data reuse around the subregion boundaries. Simulation results validate the effectiveness of the proposed scheme in CGM construction.

To support the largely increased data demands and connection requirements, communication networks are becoming more complex in the forthcoming 6G era [1]. This brings challenges to low-complexity network deployment optimization and transmission design [2]. Environment-aware communication provides a promising solution for this challenge, which requires communication-related environment information, also known as the channel knowledge map (CKM) [3], [4], as side information to be exploited when designing the system. In general, the CKM can be presented in terms of a site-specific database, which provides information on the channel parameters of interest at given geometric locations. Depending on the particular channel information it conveys, different types of CKM have been studied, including the channel shadowing map [5], channel gain map (CGM) [6], and beam index map [7]. Different CKMs can be exploited for different tasks.

channel gain, sample point, subregion, (14 more...)

arXiv.org Artificial Intelligence

2502.15733

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
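The subregional idea in the channel gain map abstract (cluster the map, then fit one model per subregion) can be sketched as follows. This is an illustration under loud assumptions, not the paper's method: plain k-means on measurement coordinates stands in for the unspecified data-driven clustering, each per-subregion "model" is just the mean gain of its samples, and all coordinates and gain values are made up.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points: List[Point], centers: List[Point], iters: int = 10) -> List[Point]:
    """Plain Lloyd iterations starting from the given initial centers."""
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def fit_subregion_models(points: List[Point], gains: List[float],
                         centers: List[Point]) -> List[float]:
    """One model per subregion; here simply the mean gain of its samples."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for p, g in zip(points, gains):
        k = nearest(p, centers)
        sums[k] += g
        counts[k] += 1
    return [s / n if n else float("nan") for s, n in zip(sums, counts)]


def predict(p: Point, centers: List[Point], models: List[float]) -> float:
    """Route the query location to its subregion and use that subregion's model."""
    return models[nearest(p, centers)]


# Made-up measurements: two spatial clusters with different gain levels (dB).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
gains = [-60.0, -62.0, -61.0, -90.0, -92.0, -91.0]
centers = kmeans(points, centers=[points[0], points[3]])
models = fit_subregion_models(points, gains, centers)
estimate = predict((0.5, 0.5), centers, models)
```

A real scheme would replace the per-subregion mean with a trained predictor and would handle samples near subregion boundaries, which the abstract addresses through training data reuse.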
